ASYMPTOTIC PROPERTIES OF STOCHASTIC APPROXIMATIONS
WITH CONSTANT COEFFICIENTS


Harold  J.  Kushner  and  Hai  Huang 

Abstract: Asymptotic properties (as $a \to 0$, $n \to \infty$) of the Stochastic
Approximation (SA) algorithm

(*) $X_{n+1}^a = X_n^a + a\,h(X_n^a, \xi_n^a)$

are obtained, where $h$ is not necessarily additive in $\xi_n^a$. If
$Eh(x,\xi_n^a) = g(x)$ and $\dot{x} = g(x)$ is asymptotically stable about a
solution $x_t \equiv \theta$, then the asymptotic properties of
$\{(X_n^a - \theta)/\sqrt{a}\} \equiv \{U_n^a\}$ are developed. In particular, it is
shown that (as $a \to 0$) a natural continuous parameter interpolation of
$\{U_n^a\}$ converges weakly to a linear diffusion process, from which the
asymptotic properties of $\{U_n^a\}$ and $\{X_n^a\}$ for small $a$ can be
obtained. The conditions on $\{\xi_n^a\}$ are reasonable from the point of
view of the usual applications to adaptive systems and identification.
These results seem to be the first of their type for SA's with constant
coefficients. Some rate of convergence results for classical SA's are
improved. Also, an application of (*) to a problem of tracking the time
varying parameters of a linear system is discussed, and a limit theorem
obtained. Because in the usual practical implementations of SA to
problems in systems theory, the gain sequence $\{a_n\}$ does not
normally go to zero (due to considerations of robustness and
non-stationarities), these results are of particular importance.



1.  Introduction 

In [1] rates of convergence for stochastic approximations
(SA) of the type

(1.1) $X_{n+1} = X_n + a_n h(X_n, \xi_n)$

were treated, where $\{a_n\}$ is a sequence of positive numbers
tending to zero and such that $\sum a_n = \infty$, and $\{\xi_n\}$ is a sequence
of random variables. In particular, we used $a_n = A/(n+1)^{\alpha}$,
$0 < \alpha \le 1$, although the proofs could have been adapted to deal
with more general sequences. As has been usual in rate of convergence
studies for SA's, it was assumed that there is a vector
$\theta$ such that $X_n \to \theta$ w.p.1, and that $\{\xi_n\}$ is a stationary
sequence. Unlike previous works on the rate of convergence
problem, [1] did not assume that $h$ is additive in the $\xi_n$. The
additivity assumption is not satisfied by many important applications
in systems theory.

In this paper, we obtain analogous results concerning
asymptotic behavior and rate of convergence for the case where
$a_n \equiv a$, a small constant. The algorithm will be written in the
form (1.2), where $f$ and $g$ are measurable functions, further properties
of which will be given below.

(1.2) $X_{n+1}^a = X_n^a + a\,g(X_n^a) + a\,f(X_n^a, \xi_n^a)$

independent  of  a 


Algorithms of the type (1.2) are particularly important in applications
to both identification theory and adaptive systems theory,
and for a version of this problem, the results are both specialized
and extended in Sections 6 and 7; in Section 7 $\{\xi_n\}$ is non-stationary,
and the "parameter tracking problem" is dealt with. In
such applications, $h$ is not additive in the noise and the
$\{\xi_n\}$ may not be a stationary sequence. Furthermore, in
engineering practice there is usually a constant $a > 0$ such
that either $\{a_n\}$ tends to $a$ or else $a_n \equiv a$, although
almost all the existing analyses of (1.1), (1.2) (indeed of all SA
methods [2], [3], [4]) assume $a_n \to 0$. The case (1.2) is more
robust than (1.1) in the sense that it can better accommodate
non-stationarities and modelling errors, and it is often the form
used in applications.
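A constant-gain recursion of this kind can be sketched in a few lines. The observation function `h` below, the target `theta = 1.0` and the uniform noise are hypothetical choices made only for illustration; they give $g(x) = -(x - \theta)$ and a noise term $f(x,\xi) = \xi(1 - (x - \theta))$ that is not additive in $\xi$.

```python
import random


def h(x, xi, theta=1.0):
    """Hypothetical observation function: the noise enters both
    multiplicatively and additively, so h is not additive in xi."""
    return -(1.0 + xi) * (x - theta) + xi


def run_constant_gain_sa(a, n_steps, x0=0.0, theta=1.0, seed=0):
    """Iterate X_{n+1} = X_n + a * h(X_n, xi_n) with bounded, zero-mean,
    i.i.d. noise (an assumption made for this sketch)."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_steps):
        xi = rng.uniform(-0.5, 0.5)  # bounded noise with E xi = 0
        x = x + a * h(x, xi, theta)
        path.append(x)
    return path


path = run_constant_gain_sa(a=0.01, n_steps=5000)
tail = path[2000:]                      # discard the initial transient
tail_mean = sum(tail) / len(tail)
print(f"final iterate {path[-1]:.4f}, tail mean {tail_mean:.4f}")
```

With a fixed gain the iterates keep fluctuating around `theta` instead of converging, with a spread that shrinks as `a` is reduced.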

In general, little is known about the sequence (1.2).
Normally, $\{X_n^a\}$ does not converge w.p.1, and if $\{\xi_n^a\}$ is
non-stationary it may not even converge in distribution. Under
various assumptions, (1.3) (a specialization of (1.2)) has been
treated in the adaptive process literature. Here B is a vector
valued bilinear form and A, C are matrices (Widrow et al. [5],
Senne [6], Davisson [7]).

(1.3) $X_{n+1}^a = X_n^a + a\,B(X_n^a, \xi_n^a) + a\,C\xi_n^a + a\,AX_n^a$.

Results such as $\lim_n E|X_n^a|^2 \to 0$ as $a \to 0$ were obtained. Our
method works under broader conditions (we assume bounded noise;
for a form of (1.3), [7] dealt with unbounded but m-dependent
Gaussian noise and used a trick similar to the concatenation of
m steps into one and then exploitation of a result for independent
$\{\xi_n\}$) and yields a much more complete picture of the
process behavior. As in [1], [3], weak convergence methods are
used.

Define $U_n^a = (X_n^a - \theta)/\sqrt{a}$ and $t_n = an$. Let $\{N_a\}$ be a
sequence which goes to $\infty$ as $a \to 0$, and define the piecewise
constant continuous parameter process $U^a(\cdot)$ by $U^a(0) = U_{N_a}^a$,
$U^a(t) = U_{N_a+n}^a$ in $[na, (n+1)a)$. We prove that $\{U^a(\cdot)\}$
converges weakly to the Gaussian diffusion (5.1) as $a \to 0$, where
R is defined below (5.1) and $H = g_x(\theta)$. The results yield
stability of the process (1.2) for small a, together with the
asymptotic (as $a \to 0$) error variances and correlation functions
(of $U^a(\cdot)$). It seems to us that the general approach is quite
straightforward and relatively easy to use. The weak convergence
and stability ideas yield a lot of intuitive insight into the
relations between the structure of an algorithm and its asymptotic
properties. Since it makes no sense to assume convergence
$X_n^a \to$ some $\theta$ a priori, some stability analysis is needed. For
the special adaptive process case when (1.2) reduces to (1.3),
the situation is simpler, and we obtain better results in
Section 7.
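The normalization by $\sqrt{a}$ and the piecewise constant interpolation can be made concrete on a scalar test model. The sketch below assumes the hypothetical recursion $d_{n+1} = (1 - a - a\xi_n)d_n + a\xi_n$ for the error $d_n = X_n - \theta$, with i.i.d. zero-mean noise of variance `sigma2`. For that model the stationary variance of $d_n/\sqrt{a}$ has a closed form, which can be compared with the stationary variance $R/(-2H)$ of the scalar diffusion $dU = HU\,dt + \sqrt{R}\,dB$.

```python
import math


def stationary_var_of_u(a, sigma2):
    """Exact stationary variance of U_n = d_n / sqrt(a) for the scalar model
    d_{n+1} = (1 - a - a*xi_n) d_n + a*xi_n, xi_n i.i.d., E xi = 0,
    Var xi = sigma2 (a hypothetical test model, not taken from the text)."""
    return sigma2 / (2.0 - a * (1.0 + sigma2))


def interpolate(u_seq, a, t):
    """Piecewise constant interpolation: returns u_seq[n] for t in
    [n*a, (n+1)*a). Floating point rounding can shift the cell index
    when t sits exactly on a cell boundary."""
    n = int(math.floor(t / a))
    return u_seq[min(n, len(u_seq) - 1)]


sigma2 = 1.0 / 12.0             # variance of Uniform(-1/2, 1/2)
H, R = -1.0, sigma2             # scalar drift and noise intensity of the limit
diffusion_var = R / (-2.0 * H)  # stationary variance of dU = H U dt + sqrt(R) dB
for a in (0.1, 0.01, 0.001):
    print(a, stationary_var_of_u(a, sigma2), diffusion_var)
```

As `a` decreases, the exact variance of the normalized error approaches the diffusion value, which is the kind of statement the limit theorem makes precise.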

In Section 2, assumptions for the general problem are
stated. Tightness of $\{U_n^a,\ n \ge N_a\}$ is obtained in Section 3.
Section 4 contains some remarks concerning special cases, and
on the use of the methods of this paper to extend known convergence
and rate of convergence results for SA's of the type (1.1) when
the SA sequence converges in probability rather than (the usual
assumption) almost surely.

The main limit theorem is given in Section 5, and Section 7
treats the special case (1.3) when the statistics of $\{\xi_n\}$ are
time varying and both H and R are functions of time. Some
of the arguments are similar to those in [1], and we formulate
the problem here so as to use the earlier results whenever
possible.

2. Assumptions for Section 3

K denotes an arbitrary real number (independent of
$x, \xi, n, a$) and its value may change from usage to usage. $G_{xx}(\cdot)$
denotes the Hessian matrix of a function G and $E_n^a$ denotes
conditioning on $\xi_i^a$, $i < n$.

Remarks on the assumptions. In order to get rate results
(i.e., limit results for $U^a(\cdot)$ or $U_n^a$, as $a \to 0$, $n \to \infty$) we
obviously require that the tails of $\{X_n^a\}$ converge in some sense
as $a \to 0$. This requires some stability properties of the
"deterministic" part of (1.2), in particular that a solution
$x_t \equiv$ constant $= \theta$ of the ODE $\dot{x} = g(x)$ is globally asymptotically
stable. For notational convenience in Section 3 and in the
assumptions, we set $\theta = 0$ there, without loss of generality.
From Section 4 on, we reintroduce $\theta$. It seems best to deal with
the stability problem by introducing a Liapunov function $V(\cdot)$
for $\dot{x} = g(x)$. Conditions (A6)-(A7) below are often guaranteed by
various forms of strong mixing conditions on $\{\xi_n^a\}$. In the usual
applications to identification and adaptive systems theory [5], [8],
there is an asymptotically stable A such that $g(x) = Ax$ and
an affine function $f(\cdot)$ such that $f(x,\xi) = f(x)\xi$. Then $V(\cdot)$
is chosen to be a quadratic form and (A4)-(A5) hold, and so do
(A6)-(A7) under simple conditions on $\{\xi_n^a\}$. See Section 7 for more detail.

A1. For each a, $\{\xi_n^a\}$ is a bounded random sequence and
$Ef(x,\xi_n^a) = 0$, all x, a, n.

A2. $g(\theta) = 0$, $\theta = 0$ (here and in Section 3, for notational
convenience only). $g(\cdot)$ and $f(\cdot,\cdot)$ are measurable.
The first and second partial x-derivatives of $f(\cdot,\xi)$ and $g(\cdot)$ are
continuous for each $\xi$.

A3. There is a non-negative three times continuously differentiable
Liapunov function $V(\cdot)$ for $\dot{x} = g(x)$ such that
$V(x) \ge 0$, $V(x) \to \infty$ as $|x| \to \infty$, $V(x) = x'Qx + o(|x|^2)$ for some
positive definite matrix Q.

A4. For some real $\gamma > 0$, $V_x'(x)g(x) \le -\gamma V(x)$.

A5. $V_{xx}(\cdot)$ is uniformly bounded and
$|f(x,\xi)|^2 + |g(x)|^2 \le K(V(x)+1)$.

A6. $\sum_{i=n}^{\infty} a\,|E_n^a V_x'(x) f(x,\xi_i^a)| \le aK(V(x)+1)$.

A7. $\sum_{i=n}^{\infty} a\,|E_n^a (V_x'(x) f(x,\xi_i^a))_{xx}| \le aK$,
$\sum_{i=n}^{\infty} a\,|E_n^a (V_x'(x) f(x,\xi_i^a))_x| \le aK(V^{1/2}(x)+1)$.

(A5) implies that f and g grow at most linearly in x.

3. Tightness of $\{U_n^a$, small $a$, $n \ge N_a\}$

Fix $K_0 > 0$. Let $N_a$ denote any integer such that
$\exp(-(a\gamma/2)N_a) \le K_0 a$. We have the $n \ge N_a$ requirement because
of the effect of the initial condition. In general, $\{X_n^a/\sqrt{a},\ n \ge 0$,
small $a\}$ will not be tight unless $X_0 = 0$. So we wait
($N_a$ steps) until the effects of the initial condition are small.
In any case, we are concerned with the tail of $\{U_n^a\}$ for
small a. For the special case (1.3), it is possible to center
the sequence $\{X_n^a\}$ in such a way that $\{U_n^a,\ n \ge 0\}$ can be
dealt with (then $N_a = 0$ is used) and a better result obtained.
See Section 7.
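The waiting time can be computed directly from its defining inequality. The sketch below takes the criterion to be $\exp(-(a\gamma/2)N) \le K_0 a$ and returns the smallest admissible integer; the function name and the sample values of `gamma` and `k0` are hypothetical.

```python
import math


def burn_in_steps(a, gamma, k0):
    """Smallest integer N with exp(-(a*gamma/2) * N) <= k0 * a."""
    if k0 * a >= 1.0:
        return 0  # the inequality already holds at N = 0
    return math.ceil(2.0 * math.log(1.0 / (k0 * a)) / (a * gamma))


for a in (0.1, 0.01, 0.001):
    n_a = burn_in_steps(a, gamma=1.0, k0=1.0)
    print(a, n_a, a * n_a)
```

For this choice the elapsed "time" $aN_a = (2/\gamma)\log(1/(K_0 a))$ grows without bound as $a \to 0$, although only logarithmically.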

Theorem 1. Under (A1)-(A7), $\{U_n^a$, small $a$, $n \ge N_a\}$ is
tight.


Proof. Again, K denotes a constant whose value may change
from usage to usage. Define (well-defined by (A6); recall $t_n = an$)

(3.1) $V_1^a(x,t_n) = a \sum_{i=n}^{\infty} E_n^a V_x'(x) f(x,\xi_i^a)$

and define

(3.2) $V^a(x,t_n) = V(x) + V_1^a(x,t_n)$.

The proof uses a Liapunov function approach with Liapunov function
$V^a$. The reason for the introduction of the $V_1^a$ term will be
clear below; basically, it is useful owing to the non-independence
of the $\{\xi_n^a\}$, and allows us to "average" out their effects. Indeed,
$V_1^a = 0$ when the $\{\xi_n^a\}$ are independent. We first evaluate

$E_n^a V^a(X_{n+1}^a,t_{n+1}) - V^a(X_n^a,t_n) = T_1 + T_2 + T_3$,

where

$T_1 = E_n^a V(X_{n+1}^a) - V(X_n^a)$,
$T_2 = E_n^a V_1^a(X_n^a,t_{n+1}) - V_1^a(X_n^a,t_n)$,
$T_3 = E_n^a V_1^a(X_{n+1}^a,t_{n+1}) - E_n^a V_1^a(X_n^a,t_{n+1})$.

Let  and  denote  random  variables  in  the  range 

3  3 

IXj^,X^^j^]  .  Then,  via  truncated  Taylor  series  expansions 

Ti  -  av;(x^)g(x^)  .  aV’(X^)t(x“,5^) 

l2  =  -av;(x“)f(x“.«^) 
i=n+1

*  1^  .  J.,^^f(>'^«n)  *  8Cxi;))MVi(x;‘)f(x;*,5“)l^^(f(X>.t„) 
i*n+l 

-  g(X^)). 


Now, (A4)-(A7) yield (note that $T_2$ cancels the second term
of $T_1$; this is the reason for the introduction of $V_1^a$)

(3.3a) $E_n^a V^a(X_{n+1}^a,t_{n+1}) - V^a(X_n^a,t_n) \le -a\gamma V(X_n^a) + a^2 K[V(X_n^a)+1]$.

By (A6), $|V_1^a(x,t_n)| \le aK(V(x)+1)$ and by (3.3a)

(3.3b) $E_n^a V^a(X_{n+1}^a,t_{n+1}) - V^a(X_n^a,t_n) \le -a\gamma V^a(X_n^a,t_n) + a^2 K[V^a(X_n^a,t_n)+1]$.

Let $a^2 K < a\gamma/2$ (or, equivalently, $a < a_0 = \gamma/2K$). Then (3.3b)
yields

(3.4) $EV^a(X_n^a,t_n) \le \exp(-a\gamma n/2)\,V^a(X_0^a,0) + Ka$.

Equation (3.4) also holds for V replacing $V^a$. Thus, by
(3.4) and (A3), for any constant $k_1$, and $n \ge N_a$, $a \le a_0$,

(3.5) $P\{V(X_n^a) \ge k_1 a\} \le \dfrac{K\exp(-a\gamma N_a/2)\,(V(X_0^a)+1) + Ka}{k_1 a} \le K/k_1$.


Tightness of $\{U_n^a$, small $a$, $n \ge N_a\}$ follows from (3.5)
in the following way. Fix $\delta > 0$. To get tightness we need a
$k_0 < \infty$ such that $P\{X_n^{a\prime} Q X_n^a / a \ge k_0\} \le \delta$, all $a \le a_0$, $n \ge N_a$.
There is an $\varepsilon_0 > 0$ such that for $x'Qx \le \varepsilon_0$, $|o(|x|^2)| \le x'Qx/2$.
For each real $k_3 > 0$, there is a $k_4(k_3) > 0$ such
that $x'Qx \ge k_3$ implies $V(x) \ge k_4(k_3)$ and we can choose $k_4(\cdot)$
to be a monotonic function.

Let $n \ge N_a$. By (3.5) (recall that K might have a different
value in each usage),

$P\{X_n^{a\prime} Q X_n^a / 2a \ge k_1\} \le K/k_1 + P\{X_n^{a\prime} Q X_n^a \ge \varepsilon_0\}$
$\le K/k_1 + P\{V(X_n^a) \ge k_4(\varepsilon_0)\} \le K/k_1 + Ka/k_4(\varepsilon_0)$.

Choose $k_1$ such that $K/k_1 = \delta/2$. If $a \le \bar{a} \equiv \delta k_4(\varepsilon_0)/2K$, then
the right hand side is $\le \delta$. If $a_0 \ge a \ge \bar{a}$, note that for any
$k > 0$

$P\{X_n^{a\prime} Q X_n^a / a \ge k\} \le P\{X_n^{a\prime} Q X_n^a / \bar{a} \ge k\} \le P\{V(X_n^a) \ge k_4(\bar{a}k)\}$
$\le Ka/k_4(\bar{a}k) \le Ka_0/k_4(\bar{a}k)$.

Now choose $k_2$ such that $Ka_0/k_4(\bar{a}k_2) \le \delta$. Finally, let
$k_0 = \max(k_1, k_2)$. Q.E.D.
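The step from the one-step drift inequality to the exponential bound can be checked numerically. The sketch below assumes the scalar worst case $v_{n+1} = (1 - a\gamma + a^2K)v_n + a^2K$ of a drift inequality of the form $E_n V_{n+1} - V_n \le -a\gamma V_n + a^2K(V_n + 1)$, and compares it with the bound $\exp(-a\gamma n/2)v_0 + (2K/\gamma)a$, which is of the form "decaying initial condition plus $Ka$". All numerical values are hypothetical.

```python
import math


def check_bound(a, gamma, K, v0, n_steps):
    """Iterate v_{n+1} = (1 - a*gamma + a^2*K) v_n + a^2*K and track the
    largest value of v_n - [exp(-a*gamma*n/2) * v0 + (2K/gamma) * a]."""
    assert a * K < gamma / 2.0, "requires a < gamma / (2K)"
    v = v0
    worst_gap = -math.inf
    for n in range(n_steps + 1):
        bound = math.exp(-a * gamma * n / 2.0) * v0 + (2.0 * K / gamma) * a
        worst_gap = max(worst_gap, v - bound)
        v = (1.0 - a * gamma + a * a * K) * v + a * a * K
    return v, worst_gap


final_v, worst_gap = check_bound(a=0.05, gamma=1.0, K=2.0, v0=10.0, n_steps=2000)
print(final_v, worst_gap)
```

A non-positive `worst_gap` means the recursion never exceeds the bound; the argument behind it is that $1 - a\gamma + a^2K \le 1 - a\gamma/2 \le \exp(-a\gamma/2)$ once $a^2K < a\gamma/2$.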
4. Remarks

(i) In a practical implementation of the algorithm (1.1),
$a_n$ might not be chosen to be constant, but might be allowed to
decrease to some value $a > 0$ by iteration number $N_a$, where
$N_a$ will be chosen such that $E|X_{N_a}|^2 \le Ka$, and $a_n$ will remain
at value a thereafter. Under our conditions, we can, in fact,
prove that $\{X_n/\sqrt{a},\ n \ge N_a\}$ is tight. But if we are only interested in
the "tail" of $\{X_n\}$, we can often assume that the initial condition
error is commensurate with the value of a (i.e., $E|X_0|^2 \le Ka$).
We might also be more concerned with the ability of the algorithm
to track changes (e.g., the changing system parameters in the
identification example (Section 7)) than with the transient errors.
Then we only need look at the "errors" for large n (say,
$n \ge N_a$) once the transient errors due to the initial condition
have been "dissipated".

(ii) Stochastic approximation (1.2) with $a_n \to 0$. Again,
suppose without loss of generality that the origin is the unique
asymptotically stable point of $\dot{x} = g(x)$. Let $a_n = A/(n+1)^{\alpha}$,
$\alpha \in (0,1]$. Then the method of Theorem 1 can be used to show
tightness of $\{X_n/\sqrt{a_n},\ n \ge 0\}$, without the (usually required)
assumption that $X_n \to 0$ w.p.1. To do this we first define
$t_n = \sum_{i=0}^{n-1} a_i$ and $V_1(x,t_n) = \sum_{i=n}^{\infty} a_i E_n V_x'(x) f(x,\xi_i)$, where $E_n$
denotes the expectation conditioned on $\xi_i$, $i < n$. Then under
(A1)-(A5) and obvious analogs of (A6)-(A7) (the a under the
summation is replaced by $a_i$ and that on the right hand side is
replaced by $a_n$), $\{X_n/\sqrt{a_n},\ n \ge 0\}$ can be shown to be tight.
Set $V^0(x,t_n) = V(x) + V_1(x,t_n)$.

In order to prove the tightness, we derive the inequality (via
the method of Theorem 1)

$E_n V^0(X_{n+1},t_{n+1}) - V^0(X_n,t_n) \le -\gamma a_n V(X_n) + a_n^2 K(V(X_n)+1)$

and then continue according to the scheme in Theorem 1 using (the
analog of (3.4))

$EV^0(X_n,t_n) \le [\exp(-\gamma t_n/2)]\,V^0(X_0,0) + K \sum_{i=0}^{n-1} \prod_{j=i+1}^{n-1} (1 - \gamma a_j/2)\,a_i^2$

and then show that the above right side is bounded above by $Ka_n$.

This result is important because the proof of tightness of
$\{X_n/\sqrt{a_n}\}$ is the basic problem in rate of convergence results for
stochastic approximations. If tightness of $\{X_n/\sqrt{a_n}\}$ is known,
then the rate of convergence proofs in [1], [3], [9] all go through
with virtually no changes without using the assumption that
$X_n \to \theta \equiv 0$ w.p.1.
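The claim that the accumulated bound is of order $a_n$ can be explored numerically. The sketch below assumes the scalar worst-case recursion $v_{n+1} = (1 - \gamma a_n/2)v_n + Ka_n^2$ with $a_n = A/(n+1)^{\alpha}$ and reports the ratio $v_n/a_n$. The values $A = 1$, $\alpha = 1/2$, $\gamma = 1$, $K = 1$ are hypothetical. For $\alpha = 1$ the size of $A$ matters (in this scalar recursion a bounded ratio needs $\gamma A/2 > 1$), so that case is not the one illustrated.

```python
def ratio_path(A, alpha, gamma, K, v0, n_steps):
    """Iterate v_{n+1} = (1 - gamma*a_n/2) v_n + K*a_n^2 with
    a_n = A/(n+1)^alpha and return the list of ratios v_n / a_n."""
    v = v0
    ratios = []
    for n in range(n_steps):
        a_n = A / (n + 1) ** alpha
        ratios.append(v / a_n)
        v = (1.0 - gamma * a_n / 2.0) * v + K * a_n * a_n
    return ratios


ratios = ratio_path(A=1.0, alpha=0.5, gamma=1.0, K=1.0, v0=1.0, n_steps=20000)
print(max(ratios), ratios[-1])
```

The ratio stays bounded and settles near $2K/\gamma$, the value obtained by balancing the decay term $\gamma a_n v_n/2$ against the input $Ka_n^2$.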


(iii) Stochastic approximation, additive noise. Continue
with the situation in the last paragraph, but let $f(x,\xi) = \xi$, the
classical Robbins-Monro case. Then (A6)-(A7) are particularly
simple. There are adaptations to the Kiefer-Wolfowitz case, where
$c_i = C/(i+1)^{\gamma}$, $a_i = A/(i+1)^{\alpha}$, $2\gamma < \alpha$, $\gamma > 0$, and $\{c_i\}$ is the
finite difference coefficient sequence. Then the normalizing
sequence is $\{\sqrt{a_n}/c_n\}$ rather than $\{\sqrt{a_n}\}$.
5. The Main Rate of Convergence Result

In this section, we let $\theta$ rather than 0 denote the
stable point of $\dot{x} = g(x)$, and introduce the additional assumptions
(A8)-(A12) below. Thus, we use $U_n^a = \delta X_n^a/\sqrt{a}$, where $\delta X_n^a = (X_n^a - \theta)$.
For each $a > 0$, define the process $U^a(\cdot)$ by $U^a(0) = U_{N_a}^a$,
and for each integer i, $U^a(t) = U_{N_a+i}^a$ in $[ia, ia+a)$. We will
show that $U^a(\cdot)$ converges weakly in $D^r[0,\infty)$ to the
Gauss-Markov process $U(\cdot)$ satisfying

(5.1) $dU = HU\,dt + R^{1/2}\,dB$, $U(0)$ = weak limit of $\{U^a(0)\}$,

where $H = g_x(\theta)$, $B(\cdot)$ is a standard Wiener process and R is
defined by (see (A9) below)

$R = \lim_{a \to 0} \sum_{i=-\infty}^{\infty} R^a(i)$,

where

$R^a(i) = Ef(\theta,\xi_j^a) f'(\theta,\xi_{j+i}^a)$.

Also, if $aN_a \to \infty$ as $a \to 0$, then the weak limit of $\{U^a(0)\}$
is the stationary solution to (5.1).
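For a linear diffusion $dU = HU\,dt + R^{1/2}dB$ with stable $H$, the stationary covariance $\Sigma$ solves the Liapunov equation $H\Sigma + \Sigma H' + R = 0$, and this is where the asymptotic error variances come from. A minimal sketch follows, integrating $d\Sigma/dt = H\Sigma + \Sigma H' + R$ by Euler steps in plain Python. The matrices `H` and `R` are hypothetical examples, not taken from any application in the text.

```python
def mat_mul(a, b):
    """Product of two small dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def lyapunov_residual(H, S, R):
    """Return H S + S H' + R, which vanishes at the stationary covariance."""
    HS = mat_mul(H, S)
    SHt = mat_mul(S, transpose(H))
    n = len(H)
    return [[HS[i][j] + SHt[i][j] + R[i][j] for j in range(n)] for i in range(n)]


def stationary_covariance(H, R, dt=1e-3, t_end=40.0):
    """Euler-integrate dS/dt = H S + S H' + R from S(0) = 0 up to t_end."""
    n = len(H)
    S = [[0.0] * n for _ in range(n)]
    for _ in range(int(t_end / dt)):
        D = lyapunov_residual(H, S, R)
        S = [[S[i][j] + dt * D[i][j] for j in range(n)] for i in range(n)]
    return S


H = [[-1.0, 0.5], [0.0, -2.0]]   # hypothetical stable matrix (eigenvalues -1, -2)
R = [[1.0, 0.2], [0.2, 0.5]]     # hypothetical symmetric positive definite matrix
S = stationary_covariance(H, R)
print(S)
```

The approximate covariance of $X_n^a - \theta$ for small $a$ is then $a\Sigma$. A dedicated linear solver would be preferable for larger systems; the time-stepping version is used here only to keep the sketch dependency-free.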

Suppose that $U^a(\cdot)$ does not converge weakly to $U(\cdot)$ in
$D^r[0,\infty)$. Then there is a sequence $\{a_k\}$ of positive numbers
which goes to zero as fast as we wish and a $T < \infty$ such that
$U^{a_k}(\cdot)$ does not converge weakly to $U(\cdot)$ in $D^r[0,T]$. Thus,
it suffices to show convergence of $U^{a_k}(\cdot)$ in $D^r[0,T]$ for an
arbitrary T, and for a sequence $\{a_k\}$ which goes to zero fast
enough, but is otherwise arbitrary. We will set the problem up
similarly to the way it was set up in [1], so that the results
of that reference can be used whenever possible.

The following assumptions are required (analogous to (A3a)
of [1]). Define $m_a(t) = \max\{i : ai \le t\}$. After stating the
conditions, we comment on their reasonableness.


A8. There is a $T_1 > 0$ such that

$P\{\max_{0 \le t \le T_1} a\,|\sum_{i=m_a(t_N)}^{m_a(t_N+t)-1} f(\theta,\xi_i^a)| \ge \varepsilon\} \le k_a(\varepsilon) \to 0$

as $a \to 0$, for each $\varepsilon > 0$, uniformly in N.

A9. Define $h_j^a = f(\theta,\xi_j^a)$. Then $\{h_j^a\}$ is stationary for
each a. Define $R^a(i) = Eh_j^a (h_{j+i}^a)'$. Then $R^a = \sum_{i=-\infty}^{\infty} R^a(i)$ is
absolutely summable and the sum converges uniformly in a. There
is a matrix R such that $R^a \to R$ as $a \to 0$.


AID.  Define  P®(i)  =  sup  E^/^lE^h®  .h®’  -  R®(il)l^ 

j  ,£>0  J  J  1  J  1  x- 

oo  1/7 

I^(P^(i))  '  <  where  the  sum  converges  uniformly  in  a. 

All.  Define  P^Ci)  =  sup  E^^^ | Efhf . . | ^ .  i  >  0.  Then 

Z  K  K+l  - 

1  <  “»  where  the  sum  converges  uniformly  in  a. 

i  =  0 


Then 


A12.  ^ 


Remarks on (A8)-(A12). The conditions do not seem to be
particularly strong. Except for the boundedness of $\{\xi_n^a\}$, they
are basically the conditions used in [1], adapted to the present
case.

Let  {5^}  be  a  <|>-mixing  process  in  the  sense  of  [10]  with 
X  where  does  not  depend  on  a.  Then  (A9)-(A11) 

hold. Condition (A3b) of [1] always holds if the noise is
bounded (set t = 0 there).

Define  =  f  let  there  be  an  R.  such  that 

J  X  j  1 

lEkjkj^^l  <  R^  for  all  j  and  (small)  a  >  0,  and  define 

R  *  X  R. .  Then  by  a  Mensov-Rademacher  type  estimate  ([3],  p.  98), 
i  ^  _ 

there  is  a  K  (depending  on  R)  such  that  for  each  Tj^  >  0 

maCtN+t)-l 

™  ?,  1  108^I"a(VTl) 

t<Ti 

- 

<  T^K  a  log24T^/a  <  K^a  log^a 

which  implies  (A8) .  Other  examples  satisfying  (A8)  appear  in  [3]. 

Theorem 2. Assume (A1)-(A12). Then $\{U^a(\cdot)\}$ converges
weakly in $D^r[0,\infty)$ to the $U(\cdot)$ of (5.1). If $N_a$ is such that
$aN_a \to \infty$, then $U(0)$ has the stationary distribution of $U(t)$.

Proof. Fix $T > 0$. Let $\{a_k\}$, $\{\varepsilon_k\}$ denote sequences of
positive numbers such that $\varepsilon_k, a_k \to 0$, and (see (A8))

(5.2) $\sum_k k_{a_k}(\varepsilon_k) < \infty$.

If (A8) holds for some $T_1$ then it holds for all $T_1$, so we can suppose
that $T_1 = T$. By the discussion at the beginning of the section
it is enough to prove the theorem for $\{U^{a_k}(\cdot)\}$.

Part 1. Define $\sqrt{a}\,f(\theta,\xi_j^a) = \delta W_j^a$, and $W_{N,n}^a = \sum_{j=N}^{N+n-1} \delta W_j^a$ and
let $W^a(\cdot)$ denote the function on [0,T] which equals
$W_{N,n}^a$ on $[an, an+a)$. By a truncated Taylor series expansion

$\delta X_{n+1}^a = \delta X_n^a + a[g_x(\theta) + f_x(\theta,\xi_n^a)]\delta X_n^a + a f(\theta,\xi_n^a) + a B_1(G(X_n^a),\delta X_n^a)\delta X_n^a$
$\equiv [I + aH_n^a]\delta X_n^a + a f(\theta,\xi_n^a) + aY_n^a$,

where $B_1(G(X_n^a),\delta X_n^a)$ is a matrix valued bilinear form in
$G(X_n^a)$ and $\delta X_n^a$, and the elements of $G(X_n^a)$ are components of
the second derivatives of $f(x,\xi_n^a) + g(x)$ evaluated at some point
in the interval $[\theta, X_n^a]$. $H_n^a$ is defined in the obvious manner.
Thus

(5.3) $U_{n+1}^a = [I + aH_n^a]U_n^a + \delta W_n^a + \sqrt{a}\,Y_n^a$.
Define $\Gamma_{N,n}^a = \sum_{j=N}^{N+n-1} \sqrt{a}\,Y_j^a$ and let $\Gamma^a(\cdot)$ denote the
function on [0,T] which equals $\Gamma_{N,n}^a$ on $[an, an+a)$.

Define the function $C_\ell^n(a)$ by, for $n \ge \ell + 1$,

$C_\ell^n(a) = \prod_{j=N+\ell+1}^{N+n} [I + aH_j^a] = (I + aH_{N+n}^a) \cdots (I + aH_{N+\ell+1}^a)$.

By iterating (5.3) and doing a summation by parts, we get
(5.4), just as (3.6) of [1] was obtained.


U 


N+n+1  ~  S  ^N+l^\,n+l  ^N,n+1^ 


(5.4) 


n 


^j^^^N+t+l^^^^N+jl^^''^N,n+l  ■  ^^N,n+1 


Part  2.  We  now  argue  that 


(5.5) 


^-(t^+t+s) 

^mJVs)  lO'TJ 


uniformly w.p.1, as $k \to \infty$, for any fixed N or sequence $N \to \infty$
as $k \to \infty$. The limit result (5.5) follows from [1], Lemma 2, when
we make the following identification of our $\{a_k\}$ with the $\{a_n\}$
in [1], Lemma 2. To avoid confusion write the $\{a_n\}$ of [1] as $\{\bar{a}_n\}$.
Then set the first $[T/a_1]$ of the $\{\bar{a}_n\}$ equal to our $a_1$, the
next $[T/a_2]$ of the $\{\bar{a}_n\}$ equal to our $a_2$, etc. Then (A8),
(5.2) and the Borel-Cantelli Lemma imply (A3) of [1], hence
also its Lemma 2 and (5.5).
Part 3. As in [1], Theorem 2, (A9)-(A11) imply that $\{W^{a_k}(\cdot)\}$
converges weakly to a Wiener process $W(\cdot)$ with infinitesimal covariance
R; i.e., $W(t) = R^{1/2}B(t)$, where $B(\cdot)$ is a standard Wiener
process. If

(5.6) $\{\Gamma^{a_k}(\cdot)\}$ converges weakly to the zero process as $k \to \infty$,

then the proof is completed via the arguments of [1], Theorem 2,
part 3, and we will only prove the desired weak convergence (5.6).
The  purpose  of  the  argument  in  [1],  Theorem  2,  part  3,  is  simply  to 
show  that  the  in  (5.4)  can  be  replaced  by  With  this 

replacement  and  the  convergence  of  {U^}  and  {W^(-),  r®(-)}. 

(5.1)  follows  from  (5.4)  and  (5.5). 

Let $M_k$ denote $[T/a_k]$, and $N_k = N_{a_k}$. In view of the
properties of $\{Y_n^a\}$, (5.6) holds if (5.7) does:

(5.7) $P\{\sqrt{a_k} \sum_{i=N_k}^{N_k+M_k-1} |X_i^{a_k}|^2 \ge \varepsilon\} \to 0$ as $k \to \infty$, each $\varepsilon > 0$.

If $V(\cdot)$, the Liapunov function of Theorem 1, is quadratic, then*
$E|X_n^{a_k}|^2 \le Ka_k$ by Theorem 1, and (5.7) holds by an application of
Chebyshev's inequality. We now prove it in the general case.

*Recall the criterion for $N_{a_k}$ given in the first sentence of
Section 3.

Recall from Theorem 1 that there is a K such that for
$n \ge N_a$ and the $V^a$ of Theorem 1,

(5.8) $EV(X_n^a) \le Ka$, $|EV^a(X_n^a,t_n)| \le Ka$, $V^a(x,t_n) \ge -Ka$,
$E_n^a V^a(X_{n+1}^a,t_{n+1}) \le (1 - \gamma a + Ka^2)V^a(X_n^a,t_n) + a^2 K$.


Let K be fixed at its above value henceforth in this proof,
and let $n \ge N_a$ and $n - N_a \le T/a$ and let a be small enough
such that

$-\gamma a + Ka^2 < 0$, $Ka < 1$, $a < 1$.

Define the random variables $L^a$ by

$L^a(X_n^a,n) = V^a(X_n^a,t_n) + aK + (T - a(n - N_a))\,a^{3/4}K$.

Then $L^a(X_n^a,n) \ge 0$. By (5.8), we have

(5.9a) $E_n^a L^a(X_{n+1}^a,n+1) \le (1 - \gamma a + Ka^2)V^a(X_n^a,t_n) + aK + (T + aN_a - (n+1)a)Ka^{3/4} + Ka^2$,

and, consequently,

(5.9b) $E_n^a L^a(X_{n+1}^a,n+1) - L^a(X_n^a,n) \le (-\gamma a + Ka^2)V^a(X_n^a,t_n) + Ka^2 - Ka^{7/4} \le 0$.

Thus $\{L^a(X_n^a,n)\}$ is a non-negative supermartingale for each
small a. Thus, there is a real $K_1$ such that
(5.10) $P\{\sup_{N_k \le i \le N_k+M_k-1} L^a(X_i^a,i) \ge a^{5/8}\} \le EL^a(X_{N_k}^a,N_k)/a^{5/8} \le K_1(a + a^{3/4})/a^{5/8} = O(a^{1/8})$.

There is a $K_2 < \infty$ such that if $L^a(X_n^a,n) \le a^{5/8}$, then
$V(X_n^a) \le K_2 a^{5/8}$. We can suppose that a is small enough so that
$V(x) \le K_2 a^{5/8}$ implies that $V(x) \ge x'Qx/2$, and $V^a(x,n) \ge x'Qx/2 - aK$.
Then, for small a and $L^a(X_n^a,n) \le a^{5/8}$,

(5.11) $0 \le L^a(X_n^a,n) = O(a^{3/4}) + \delta_n$,

where $\delta_n \ge (X_n^a)'QX_n^a/2$. Equation (5.7) follows from (5.10) and
(5.11), since there is a real $K_3$ such that with probability
$\ge 1 - O(a^{1/8})$,

$\sqrt{a} \sum_{i=N_k}^{N_k+M_k-1} (X_i^a)'QX_i^a/2 \le \sqrt{a}\,(T/a)(K_3 a^{5/8}) = O(a^{1/8})$. Q.E.D.
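The powers of $a$ in the supermartingale argument must balance in three places, and the bookkeeping is easy to verify with exact fractions. The sketch below assumes that the exponents in (5.9)-(5.11) read $3/4$ (the time-dependent term of $L^a$), $5/8$ (the threshold in the maximal inequality) and $1/8$ (the resulting rates); the variable names are hypothetical.

```python
from fractions import Fraction

beta = Fraction(3, 4)   # assumed exponent of a in the time-dependent term of L
p = Fraction(5, 8)      # assumed threshold exponent in the maximal inequality

# Maximal inequality: E L / a^p with E L = O(a + a^beta) = O(a^min(1, beta)).
prob_exponent = min(Fraction(1), beta) - p
# Sum bound: sqrt(a) * (T/a) * a^p, with T fixed.
sum_exponent = Fraction(1, 2) - 1 + p
# Drift: the a^2 term is dominated by a * a^beta only if 1 + beta < 2.
drift_ok = 1 + beta < 2

print(prob_exponent, sum_exponent, drift_ok)
```

Both the exceptional probability and the bound on the normalized sum then vanish at the rate $a^{1/8}$, which is what the final step needs.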


6. Adaptive Systems - Examples


We  will  describe  very  briefly  two  of  the  more  important 
systems  which  fall  into  our  framework.  Let  (’^n’^n^  ^^n  ^ 

denote  the  input  and  output  sequences,  resp.,  of  the  linear  system 


(6.1) 


■ 


*  ^ic^n-k'  *“’0%  *  • 


•  *  *  % 


Suppose  that  the  system  is  asymptotically  stable  when  u^  =  0,  =  0. 

Define 
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and  let  be  a  zero  mean  random  sequence  which  is  independent 
of  the  zero  mean  sequence  ^  common  algorithm  for  estimating 
0  is 


(6.2) 


'n+1 


n  n 


W^n' 


where $X_n$ is the n-th estimate of $\theta$. Under various conditions
(including $a_n \to 0$) $X_n \to \theta$ w.p.1 [2], [3]. In practice, due to
extraneous noise, robustness considerations, or model uncertainties,
it is common for either $a_n \to a > 0$ or $a_n \equiv a$, a constant, perhaps
a matrix. The case where $\theta$ varies with time and $a_n \equiv a$
is dealt with in detail in the next section.
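A constant-gain identification recursion of this general kind can be sketched as follows. The sketch assumes the standard least-mean-squares form $X_{n+1} = X_n + a\,\phi_n(y_n - \phi_n'X_n)$ with regressor $\phi_n = (u_n, u_{n-1})'$ and a purely moving-average system $y_n = \phi_n'\theta + \text{noise}$. The system order, the parameter values and the uniform input and noise distributions are hypothetical simplifications.

```python
import random


def lms_identify(theta, a, n_steps, noise_amp=0.1, seed=1):
    """Constant-gain recursion X_{n+1} = X_n + a * phi_n * (y_n - phi_n' X_n)
    for y_n = phi_n' theta + noise, with regressor phi_n = (u_n, u_{n-1})'."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    u_prev = 0.0
    for _ in range(n_steps):
        u = rng.uniform(-1.0, 1.0)                  # bounded zero-mean input
        regressor = (u, u_prev)
        noise = rng.uniform(-noise_amp, noise_amp)  # bounded zero-mean noise
        y = regressor[0] * theta[0] + regressor[1] * theta[1] + noise
        err = y - (regressor[0] * x[0] + regressor[1] * x[1])
        x = [x[0] + a * regressor[0] * err, x[1] + a * regressor[1] * err]
        u_prev = u
    return x


estimate = lms_identify(theta=(0.8, -0.3), a=0.01, n_steps=20000)
print(estimate)
```

The estimate hovers near the true parameter with fluctuations whose standard deviation is of order $\sqrt{a}$, rather than converging.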

Next, consider a similar algorithm which is very useful in
adaptive communications theory. Let $\{S_{ni}\}$, i = 1,2, and
$\{N_{ni}\}$, i = 1,2, represent stationary signal and noise sequences,
resp. $\{S_{n1}\}$ and $\{S_{n2}\}$ ($\{N_{n1}\}$ and $\{N_{n2}\}$, resp.) are related
in the sense that they are signal (noise, resp.) processes appearing
at the inputs to different antennas, but are from the same
transmitting source. Let $y_n = S_{n1} + N_{n1}$ and $u_n = S_{n2} + N_{n2}$
denote the actual inputs to the two antennas. Let k be a fixed
integer and set $\psi_n = (u_n, \ldots, u_{n-k})'$. It is desired to find the
weight vector $\bar{X}$ which is the minimizing X in the expression
$E[y_n - \psi_n'X]^2$. The motivation behind this desire is that (roughly
speaking) if the power (in the communication theory sense) in the
$\{N_{ni}\}$ sequences is greater than that in the $\{S_{ni}\}$ sequences, then
under reasonable conditions the ratio of signal to noise power in the
"output" difference sequence $\{y_n - \psi_n'\bar{X}\}$ is essentially the inverse
of that in the input sequence $\{y_n\}$. This is obviously a desirable
result. See [5] for the simple calculation, along with a
discussion of some useful applications.

The algorithm (6.2) is often used to calculate the optimum weight
vector $\bar{X}$ recursively, when $a_n \equiv a$. But, in this context, (6.2) is
not well understood. Usually, it is only proved that $EX_n$ converges.
Exceptions to this are the work of Davisson [7] (with m-dependent
stationary Gaussian sequences as inputs) and Senne [6] (where the
stationary inputs satisfy a type of mixing condition), where it
is proved that $E|X_n - \bar{X}|^2 \to 0$ as $a \to 0$. The method of Section 7
exploits the technique of the last section in order to get a more
complete picture in the general case where a is small and the
processes are non-stationary, an important case which actually
justifies the use of the adaptive algorithm, but which has not
yet been dealt with in the literature.


7. The Non-Stationary Identification Problem


In this section, the parameter $\theta$ in (6.1) is allowed to
vary with time, and we let $\theta_n^a$ denote its value at time n. Since
the variations in $\{\theta_n^a\}$ affect the statistics of $\{\psi_n\}$, the
identification problem is more complicated than the adaptive
communications problem, and we consider only the former case.

Assume  that  (u  ,y  }  are  bounded.  Non-stationarities  due  to  the 
n  n 

9®  variations  are  more  difficult  to  treat  than  the  effects  of 
n 




non- stationary  {u^,y^}.  In  order  to  concentrate  on  the  more 

important  effects  and  minimize  the  notation,  we  assume  that  {u  } 

n  n 

is  stationary.  Also  is  assumed  to  be  zero  mean,  and  in¬ 

dependent  of  {Uj^}  and  Eu^  =  0. 

We now model the time variations. Let $\theta(\cdot)$ denote a uniformly
continuous vector-valued function on $[0,\infty)$, with values
in a bounded set S. Suppose that the parameter* $\theta_n^a$ takes the
value $\theta(an)$. To see the reasonableness of the model note that
the rate of change of the $\theta_n^a$ must go to zero in some sense as
$a \to 0$, for otherwise tracking would not be possible. $\theta(\cdot)$ could
be a random process, but no generality is gained by that, since
we treat one sample function at a time anyway. The uniform
continuity  condition  is  used  to  assure  that  the  sequence  has 

a  certain  stability  property  on  [0,<»).  We  want  to  avoid  9(*) 
getting  "wilder  and  wilder"  as  t  -►  «>.  It  is  not  needed  if  we  are 
concerned  with  some  "finite"  interval  {n:  na  <  T}  only. 

a 

We  could  allow  to  be  a  random  sequence  for  each  a. 

Even  then,  its  rate  of  change  must  still  be  proportional  to  a  in 
some  sense  (or  to  a  fractional  power  of  a;  but  then  the 
terms play no role in the limit as $a \to 0$). In any case, we want
a (limit) equation which yields the limit of the behavior of the
normalized interpolation of the error process in terms
of the limit of the parameter process, so that the precise
relationship can be seen. Our scheme is a natural way to get this.
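The parametrization in which the parameter at step n equals $\theta(an)$ can be illustrated on a scalar tracking problem. The sketch below assumes $\theta(t) = \sin t$, inputs $u_n = \pm 1$ and bounded uniform observation noise, all hypothetical. It compares the estimate with the noise-free recursion $\bar{x}_{n+1} = \bar{x}_n + a(\theta(an) - \bar{x}_n)$, used here as one natural deterministic centering sequence; whether this coincides with the centering used in the analysis is an assumption.

```python
import math
import random


def track(a, t_end, seed=2):
    """Scalar constant-gain tracking of theta(t) = sin(t), sampled as theta(a*n).
    Returns the largest deviation of the estimate from the noise-free recursion
    and the largest lag of that recursion behind the true parameter."""
    rng = random.Random(seed)
    x = 0.0        # stochastic estimate X_n
    x_bar = 0.0    # noise-free recursion (hypothetical centering sequence)
    max_dev = 0.0
    max_lag = 0.0
    for n in range(int(t_end / a)):
        theta_n = math.sin(a * n)
        u = rng.choice((-1.0, 1.0))        # bounded zero-mean input, u^2 = 1
        noise = rng.uniform(-0.5, 0.5)     # bounded zero-mean observation noise
        y = theta_n * u + noise
        x = x + a * u * (y - u * x)
        x_bar = x_bar + a * (theta_n - x_bar)
        max_dev = max(max_dev, abs(x - x_bar))
        max_lag = max(max_lag, abs(x_bar - theta_n))
    return max_dev, max_lag


max_dev, max_lag = track(a=0.001, t_end=20.0)
print(max_dev, max_lag)
```

In this example the estimate stays within a band of order $\sqrt{a}$ around the noise-free recursion, while that recursion itself lags the moving parameter by an amount that does not shrink with $a$. This suggests centering the error about a deterministic sequence rather than about the parameter itself.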


^The  parameter  9®  is  the  value  of  (c ^  , .  .  .  ,  Cj^ , bQ . b^)  '  at 

time  n.  Then  the  Cj,bj  are  components  of  (hence  functions  of) 

9^  at  time  n. 
n 
The main object is to get some information on the properties
of  when  a  is  small.  We  might  be  interested,  for  example, 

in  an  approximation  to  the  distribution  of  some  continuous  function 
of  5  T}.  To  get  this,  it  makes  sense  to  parametrize  the 

problem  so  that  we  can  get  a  limit  result  (as  a  -►  0)  which  will 
serve  as  the  approximation  to  the  from  which  the  approxi¬ 

mation  to  the  distributions  of  functions  can  be  obtained  (particularly 
if  the  convergence  is  in  the  sense  of  weak  convergence) .  If  we  allow 
a  -►  0  without  simultaneously  slowing  down  the  rate  of  variation  of 
9^,  then  obviously  no  limit  result  is  possible,  in  general.  Thus, 

to  even  discuss  the  behavior  for  small  a,  we  must  allow  the  9® 

n 

to  depend  on  a.  As  mentioned  above,  there  are  several  ways  in  which 
this  can  be  done.  Our  choice  allows  a  relatively  simple  exhibition 
of  the  structure  that  the  limit  would  have  in  a  wide  variety  of  cases 
Cwhere,  perhaps,  0C')  might  be  a  limit  in  some  sense  of  the 

a  Q 

sequence  of  parameter  variation  functions  9  (•),  where  9  (t)  = 

9j^  on  [an,an+a)). 

The  problem  is  set  up  in  the  next  subsection. 
The problem is formulated and some terms are defined in
Subsection 7.1. Subsection 7.2 obtains estimates concerning the
dependence of $\psi_n(\theta)$ on $\theta$. Subsections 7.3 and 7.4 obtain a
limit theorem for the interpolation of a deterministic centering
sequence $\{\bar Y_n\}$, and tightness of $\{U_n^a\}$. In
Subsection 7.5, the $C_n^m(a)$ are approximated by an exponential
function and in Subsection 7.6, Theorem 5 gives the appropriate
Wiener process limits and the convergence theorem for $\{U^a(\cdot)\}$.


7.1. Formulation of the problem. Let $\{y_n(\gamma)\}$, $\{\psi_n(\gamma)\}$ denote the
output and output-input sequence when $\theta_n^a = \gamma$ for all $n$; then
$\psi_n(\gamma) = (-y_{n-1}(\gamma),\ldots,-y_{n-k}(\gamma),u_n,\ldots,u_{n-\ell})'$. Under the
assumptions below, these sequences
are second order stationary for each $\theta \in S$. Define
$R(\gamma) = E\psi_n(\gamma)\psi_n'(\gamma)$, $R_t = R(\theta(t))$ and $R_n^a = E\psi_n\psi_n'$ = the true
covariance. Set $Y_n = X_n - \theta_n^a$, $\delta\theta_n^a = \theta(an+a) - \theta(an)$,
$\beta_n^a = R_n^a - \psi_n\psi_n'$, $F_n^a = E\psi_n\mu_n$
and $F(\gamma) = E\psi_n(\gamma)\mu_n$. Except for the above defined terms, the
superscript "a" will normally be omitted for notational convenience,
in particular on $Y_n, X_n$ and $\bar Y_n, \tilde Y_n$ below. We have

$X_{n+1} = X_n + a\psi_n[y_n - \psi_n'X_n],$

$Y_{n+1} = Y_n - \delta\theta_n^a - aR_n^aY_n + a\beta_n^aY_n + a\psi_n\mu_n.$

Define the sequence $\{\bar Y_n\}$ by

(7.1a)  $\bar Y_{n+1} = \bar Y_n - \delta\theta_n^a - aR_n^a\bar Y_n + aF_n^a, \quad \bar Y_0 = Y_0,$

and define $\tilde Y_n = Y_n - \bar Y_n$. Then

(7.1b)  $\tilde Y_{n+1} = \tilde Y_n - aR_n^a\tilde Y_n + a\beta_n^aY_n + a(\psi_n\mu_n - F_n^a).$

$\bar Y_n$ is the "noiseless" part of $Y_n$, and contains the effects of
the initial conditions. It is most convenient to work with the form
$Y_n = \bar Y_n + \tilde Y_n$. This avoids the requirement of Section 3 (due to
the effects of the initial condition) that $n \ge N_a$. Finally,
define $\{U_n^a\} = \{\tilde Y_n/\sqrt{a}\}$, the sequence whose convergence we will
ultimately deal with.
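
The constant-gain recursion for $X_n$ can be illustrated numerically. The following is a minimal sketch in the scalar case only. The regressor model ($\psi_n = \pm 1$ with equal probability), the Gaussian noise, and the names `track`, `theta`, `noise_std` and `rms` are assumptions made for this illustration and are not taken from the text.

```python
import random


def track(theta, a, n_steps, noise_std=0.1, x0=0.0, seed=0):
    """Run X_{n+1} = X_n + a*psi_n*(y_n - psi_n*X_n) in the scalar case.

    theta -- function t -> parameter value; the parameter at step n is theta(a*n)
    a     -- constant gain
    Returns the tracking errors Y_n = X_n - theta(a*n), n = 0, ..., n_steps-1.
    """
    rng = random.Random(seed)
    x = x0
    errors = []
    for n in range(n_steps):
        th = theta(a * n)
        errors.append(x - th)
        psi = rng.choice((-1.0, 1.0))      # assumed regressor, psi^2 = 1
        mu = rng.gauss(0.0, noise_std)     # assumed observation noise
        y = psi * th + mu                  # scalar observation y_n = psi_n*theta_n + mu_n
        x = x + a * psi * (y - psi * x)    # constant-gain update
    return errors


def rms(values):
    """Root mean square of a list of numbers."""
    return (sum(v * v for v in values) / len(values)) ** 0.5
```

For a fixed parameter, `rms(track(lambda t: 1.0, a, 20000)[10000:])` can be compared for `a = 0.04` and `a = 0.01`. Under the normalization $U_n^a = \tilde Y_n/\sqrt{a}$ one would expect the ratio to be roughly 2, although a single simulated path is only suggestive.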

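In order to exploit the stability properties of (6.1), it
is convenient to work with (6.1) in state variable form. To set
this up, define $\bar u_n = (u_n,\ldots,u_{n-\ell})'$ and $Z_n = (y_{n-k},\ldots,y_{n-1})'$.
Recall that, by definition, $\theta_n^a = \theta(an)$ = value of $(c_1,\ldots,c_k,
b_0,\ldots,b_\ell)'$ at time $n$, a $(k + \ell + 1)$ vector. For any $S$ valued
parameter $\theta$, we define

$$A(\theta) = \begin{bmatrix} 0 & 1 & & 0 \\ & & \ddots & \\ 0 & & & 1 \\ -c_k(\theta) & \cdots & & -c_1(\theta) \end{bmatrix}, \quad B(\theta) = \begin{bmatrix} 0 \\ b_0(\theta)\ \cdots\ b_\ell(\theta) \end{bmatrix}, \quad C = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}, \quad D = [0,\ldots,0,1].$$

We define $A_i = A(\theta_i^a)$, $B_i = B(\theta_i^a)$. Then $Z_{i+1} = A_iZ_i +
B_i\bar u_i + C\mu_i$, $y_i = DZ_{i+1}$. Define $Z_n(\theta) = (y_{n-k}(\theta),\ldots,y_{n-1}(\theta))'$.
Then

(7.2)  $Z_{i+1}(\theta) = A(\theta)Z_i(\theta) + B(\theta)\bar u_i + C\mu_i, \quad y_i(\theta) = DZ_{i+1}(\theta).$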
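
The state variable form can be checked numerically. The sketch below builds $A(\theta)$, $B(\theta)$ and $C$ from coefficient lists and performs one step of (7.2). It assumes that (6.1), which is not reproduced in this section, is the difference equation $y_n = -\sum_j c_j y_{n-j} + \sum_j b_j u_{n-j} + \mu_n$; the function names are hypothetical.

```python
def companion(c):
    """A(theta): k x k, ones on the superdiagonal, last row (-c_k, ..., -c_1)."""
    k = len(c)
    A = [[0.0] * k for _ in range(k)]
    for i in range(k - 1):
        A[i][i + 1] = 1.0
    A[k - 1] = [-c[k - 1 - j] for j in range(k)]
    return A


def matvec(M, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def state_step(c, b, Z, ubar, mu):
    """One step Z_{n+1} = A Z_n + B ubar_n + C mu_n.

    c    -- [c_1, ..., c_k];  b -- [b_0, ..., b_l]
    Z    -- [y_{n-k}, ..., y_{n-1}];  ubar -- [u_n, ..., u_{n-l}]
    The last component of the result is y_n = D Z_{n+1}.
    """
    k = len(c)
    A = companion(c)
    B = [[0.0] * len(b) for _ in range(k - 1)] + [list(b)]  # only the last row is nonzero
    C_mu = [0.0] * (k - 1) + [mu]                            # C = (0, ..., 0, 1)'
    AZ, Bu = matvec(A, Z), matvec(B, ubar)
    return [p + q + r for p, q, r in zip(AZ, Bu, C_mu)]
```

The first $k-1$ components of the new state are just shifted copies of the old outputs, and only the last row carries the dynamics; this is why the stability of the whole scheme reduces to the behavior of powers and products of $A(\theta)$.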

Write $E_n$ for the expectation conditioned on $\{u_i, \mu_i,\ i < n\}$.

The  following  assumptions  are  required. 

(B1)  $|A^n(\theta)| \to 0$ as $n \to \infty$, uniformly for $\theta \in S$.

(B2)  $\sum_{i=n}^{\infty}|E_n(\psi_i\psi_i' - R_i^a)| = \sum_{i=n}^{\infty}|E_n\beta_i^a|$ is bounded uniformly in
$n, \omega$.

(B3)  There is a $q_0 > 0$ such that $R(\theta) - q_0I$ is non-negative
definite, for all $\theta \in S$.

(B4)  $\sum_{i=n}^{\infty}|E_n(\psi_i\mu_i - F_i^a)| = \sum_{i=n}^{\infty}|E_n\rho_i^a|$ is bounded uniformly
in $n, \omega$.

(B2) and (B4) are not restrictive. Under (B1), they hold
under a $\phi$-mixing condition (see [10] for the definition) on
$\{u_n, \mu_n\}$ with $\sum\phi_n^{1/2} < \infty$. (B4) holds if the $\{\mu_i\}$ are mutually
independent.

7.2. Some preparatory estimates. Since the statistics of
the $\psi_n(\theta)$ are easier to get than those of the $\psi_n$, we show that
$\psi_i$ can be well approximated by $\psi_i(\theta_i^a)$, uniformly in $i$, for
small $a$.

Let $P(\theta)$ denote the unique symmetric positive definite Liapunov
matrix satisfying $A'(\theta)P(\theta)A(\theta) - P(\theta) = -I$. By (B1), there are
$0 < p_1 \le p_2 < \infty$ such that

(7.3)  $p_1I \le P(\theta) \le p_2I, \quad \theta \in S.$
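
For a concrete stable matrix, a Liapunov matrix of this kind can be computed directly. The sketch below assumes that all eigenvalues of $A$ lie strictly inside the unit circle and uses the fact that the series $P = \sum_{j \ge 0}(A')^jA^j$ then converges and satisfies $A'PA - P = -I$. The helper names and the truncation length are choices made for this illustration.

```python
def matmul(X, Y):
    """Product of two list-of-lists matrices."""
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def liapunov(A, n_terms=500):
    """Truncated series P = sum_j (A^j)' A^j, solving A'PA - P = -I for stable A."""
    k = len(A)
    P = [[0.0] * k for _ in range(k)]
    power = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]  # A^0
    for _ in range(n_terms):
        term = matmul(transpose(power), power)      # (A^j)' A^j
        P = [[P[i][j] + term[i][j] for j in range(k)] for i in range(k)]
        power = matmul(A, power)                    # A^(j+1)
    return P


def residual(A, P):
    """Largest absolute entry of A'PA - P + I."""
    k = len(A)
    APA = matmul(matmul(transpose(A), P), A)
    return max(abs(APA[i][j] - P[i][j] + (1.0 if i == j else 0.0))
               for i in range(k) for j in range(k))
```

Since the $j = 0$ term of the series is the identity, $P \ge I$ always holds, which is one way to see the lower bound in (7.3) for this construction.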


We now obtain a series of results concerning the closeness of
$R(\theta_n^a)$† to $R_n^a$, and of $Z_{n+1}$ to $Z_{n+1}(\theta_n^a)$,
the value obtained from (7.2), when the $\theta$ in (7.2) is held fixed
at $\theta_n^a$ for all $i$ (i.e., $A(\theta) = A_n$, $B(\theta) = B_n$). We can write

(7.4)  $Z_{n+1} = \sum_{j=-\infty}^{n} A_n \cdots A_{j+1}[B_j\bar u_j + C\mu_j].$

For small $a$, the sum in (7.4) converges uniformly by virtue of
the stability assumption and its consequence (7.3). Indeed, by
(7.3) and the fact that $|A(\theta_{n+1}^a) - A(\theta_n^a)| \to 0$ uniformly in $n$ as
$a \to 0$, there are $a_0 > 0$, $\epsilon > 0$, $K_1 < \infty$, such that (see also [11] for
a similar estimate)

(7.5a)  $|A_n \cdots A_{j+1}| \le K_1(1-\epsilon)^{n-j}$, all $n, j$, for $a \le a_0$.

By (B1), we can suppose that $K_1, \epsilon$ are chosen such that (7.5b) also
holds.

(7.5b)  $|A^j(\theta)| \le K_1(1-\epsilon)^j.$

The following approximation result is the basis of much of the
rest of the development.

†Recall that $R(\theta_n^a)$ is the covariance $E\psi_i(\theta_n^a)\psi_i'(\theta_n^a)$; i.e., the
parameter $\theta$ is held fixed at $\theta = \theta_n^a$.
Lemma 1. Under the stability assumption (B1),

$|Z_{n+1} - Z_{n+1}(\theta_n^a)| \to 0$ as $a \to 0$,

uniformly in $n$.

Proof. Let $M_a$ denote an integer whose specific value will be
selected below. The variations in the $B_j$ cause no problem in
the proof and neither does $\{C\mu_j\}$. So, in order to simplify the
proof set $B_j \equiv B$, a constant, and $C = 0$; i.e., the $b_i$ components
of $\theta(\cdot)$ are constant. Then


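(7.6)  $|Z_{n+1} - Z_{n+1}(\theta_n^a)| \le \sum_{j=-\infty}^{n-M_a}|A_n \cdots A_{j+1}B\bar u_j| + \sum_{j=-\infty}^{n-M_a}|(A_n)^{M_a}(A_n)^{n-M_a-j}B\bar u_j| + \sum_{j=n-M_a+1}^{n}|A_n \cdots A_{j+1} - A_n^{n-j}|\,|B\bar u_j|.$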


By (7.5), there is a real $K$ (not depending on $a$ or $M_a$)
such that the first two terms of (7.6) are each bounded in norm by
$K(1-\epsilon)^{M_a}/\epsilon$. We will next get a bound on the third term. In (7.6),
the value of the time parameter $n$ plays no special role and it
is enough for us to show that (7.7) tends to zero uniformly in
$A_0 = A(\theta(0))$ as $a \to 0$.

(7.7)  $\sum_{j=0}^{M_a-1}|A_j \cdots A_1 - A_0^j|.$

In (7.7), $A_j$ takes the form $A_j = A_0 + \delta_j$, and all that we assume on
$\delta_j$ is that there is a real $K_0$ such that
$|\delta_i| \le K_0|\theta(ia) - \theta(0)|$. It is convenient to work in a matrix
norm $|\cdot|_0$ which might depend on $\theta(0)$ but where there are real
$K_1, K_2$ independent of $\theta(0)$, such that $K_1|\cdot| \le |\cdot|_0 \le K_2|\cdot|$.
In particular define $|A|_0^2 = \sup_x x'A'P(\theta(0))Ax / x'P(\theta(0))x$. Then $|A_0|_0 < 1$.
By (7.3), $K_1, K_2$ exist and we can also suppose that $|\delta_i|_0 \le
K_2|\theta(ia) - \theta(0)|$. For the $j$th term of (7.7) in the $|\cdot|_0$ norm, we have


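$|A_j \cdots A_1 - A_0^j|_0 \le |A_0|_0^{j-1}\sum_{i}|\delta_i|_0 + |A_0|_0^{j-2}\sum_{i_2 > i_1}|\delta_{i_1}|_0|\delta_{i_2}|_0 + \cdots.$

Let $\Delta = K_2\sup_{s \le M_a a}|\theta(s) - \theta(0)|$. Then a crude upper bound on the
above is

$|A_0|_0^j\left[\binom{j}{1}\frac{\Delta}{|A_0|_0} + \binom{j}{2}\left(\frac{\Delta}{|A_0|_0}\right)^2 + \cdots\right] \le |A_0|_0^j\left[\left(1 + \frac{\Delta}{|A_0|_0}\right)^j - 1\right]$

and (7.7) satisfies

(7.8)  $|(7.7)| \le K\sum_{j=0}^{M_a-1}|A_0|_0^j\left[\left(1 + \frac{\Delta}{|A_0|_0}\right)^j - 1\right].$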
Now choose $M_a \to \infty$ as $a \to 0$ in such a way that $aM_a$ (hence $\Delta$)
goes to zero. Then, since $\sup_{\theta(0)}|A(\theta(0))|_0 < 1$, the right side of
(7.8) and $(1-\epsilon)^{M_a}$ both tend to zero uniformly in $\theta(0) \in S$, as
$a \to 0$. Q.E.D.




Similar proofs yield the following corollaries.

Corollary 1. Assume (B1). Let $M_aa \to 0$ and $M_a \to \infty$ as
$a \to 0$ and let $\Theta_n^a$ denote the set $\{\theta(u): na - M_aa \le u \le na + M_aa\}$.
Then

(7.9)  $\sup_{\theta \in \Theta_n^a}|Z_{n+1}(\theta_n^a) - Z_{n+1}(\theta)| \to 0$

as $a \to 0$, uniformly in $n$.


Corollary 2. Assume (B1), (B3). Then $|R(\theta_n^a) - R_n^a| \to 0$
uniformly in $n$ as $a \to 0$. Also there are $K < \infty$, $\epsilon_0 > 0$ such
that for $n \ge j$

(7.10)  $|(I - aR(\theta_n^a))(I - aR(\theta_{n-1}^a)) \cdots (I - aR(\theta_{j+1}^a))| \le K(1 - a\epsilon_0)^{n-j}$

for all $n, j$ and small $a$. The function $R_t = R(\theta(t))$ is
continuous. The function $F(\cdot)$ is continuous and $F_n^a \to F(\theta(t))$,
uniformly in $t$, as $a \to 0$, $n \to \infty$ if $an$ is held equal to $t$.

Proof. The second assertion is a consequence of the continuity
of $\theta(\cdot)$, and (B3) and (7.9). The rest are consequences of Lemma 1
and Corollary 1, and the details are omitted.

7.3. A limit theorem for $\{\bar Y_n\}$. We next turn to the treatment
of the deterministic sequence $\{\bar Y_n\}$. Let $\Phi(t,s)$, $t \ge s$,
denote the fundamental solution of the linear equation $\dot x = -R_tx$
and let $\bar Y^a(\cdot)$ denote the piecewise constant function on $[0,\infty)$
with values $\bar Y^a(t) = \bar Y_n$ on $[an,an+a)$, $n \ge 0$.

Lemma 2. Assume (B1), (B3). Then $\{\bar Y_n\}$ is uniformly
bounded. If the $R_j^a$ in (7.12) are replaced by $R(\theta_j^a)$, then the
difference between $\bar Y_n$ and the new right hand side converges to
zero uniformly in $n$, as $a \to 0$. As $a \to 0$, $\bar Y^a(\cdot)$ converges
uniformly on bounded intervals to the function $\bar Y(\cdot)$ defined by

(7.11a)  $\bar Y(t) = \Phi(t,0)\bar Y(0) - \int_0^t\Phi(t,s)\,d\theta(s) + \int_0^t\Phi(t,s)F(\theta(s))\,ds$

$\qquad = \Phi(t,0)\bar Y(0) - \Phi(t,0)(\theta(t) - \theta(0)) - \int_0^t\Phi(t,s)R_s(\theta(t) - \theta(s))\,ds + \int_0^t\Phi(t,s)F(\theta(s))\,ds,$

which is the unique solution to the equation

(7.11b)  $d\bar Y(t) = -R_t\bar Y(t)\,dt - d\theta(t) + F(\theta(t))\,dt.$

Proof. For the first assertion we write the solution to
(7.1a) in the form (using a summation by parts to get the second
equation)

(7.12)  $\bar Y_{n+1} = \prod_{i=0}^{n}(I - aR_i^a)\bar Y_0 - \sum_{i=0}^{n}\prod_{j=i+1}^{n}(I - aR_j^a)\delta\theta_i^a + \sum_{i=0}^{n}\prod_{j=i+1}^{n}(I - aR_j^a)aF_i^a$

$\qquad = \prod_{i=0}^{n}(I - aR_i^a)\bar Y_0 - \prod_{i=1}^{n}(I - aR_i^a)[\theta(an+a) - \theta(0)]$

$\qquad\quad - \sum_{i=1}^{n}a\prod_{j=i+1}^{n}(I - aR_j^a)R_i^a[\theta(na+a) - \theta(ia)] + \sum_{i=0}^{n}\prod_{j=i+1}^{n}(I - aR_j^a)aF_i^a.$

Now use Corollary 2 together with the boundedness of $\theta(\cdot)$.

The second assertion follows from Corollary 2. The last
assertion then follows by letting $a \to 0$, $n \to \infty$, $an = t$, in (7.12)
and noting that for $t \ge s$

$\prod_{i=m_a(s)}^{m_a(t)-1}(I - aR(\theta_i^a)) \to \Phi(t,s), \quad t \ge s,$

uniformly on bounded $s, t$ intervals. Q.E.D.
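
The convergence asserted in Lemma 2 can be illustrated in a simple special case. The sketch below is a scalar example under assumptions not made in the text: $R_t \equiv r$ is constant, $F \equiv 0$, and $\theta(t) = vt$ is a ramp considered on a finite interval only. The recursion (7.1a) is then compared with the closed-form solution of (7.11b). The function names are hypothetical.

```python
import math


def ybar_recursion(y0, r, v, a, t):
    """Iterate Ybar_{n+1} = Ybar_n - dtheta_n - a*r*Ybar_n up to time t = a*n."""
    n_steps = int(round(t / a))
    y = y0
    for _ in range(n_steps):
        dtheta = v * a                 # theta(an + a) - theta(an) for theta(t) = v*t
        y = y - dtheta - a * r * y
    return y


def ybar_limit(y0, r, v, t):
    """Solution of dY = -r*Y dt - v dt, i.e. Y(t) = (y0 + v/r)*exp(-r*t) - v/r."""
    return (y0 + v / r) * math.exp(-r * t) - v / r
```

In this example the limit settles at $-v/r$: a steadily drifting parameter produces a persistent lag in the centering function, proportional to the drift rate and inversely proportional to $r$.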

7.4. Tightness of $\{U_n^a\}$. With the preparatory results
available, we proceed to the main result, by following the pattern
of development in Theorem 1.

Theorem 5. Under (B1)-(B4), $\{U_n^a,\ n \ge 0$, small $a\}$ is tight.
In particular (since $\tilde Y_0 = U_0 = 0$), $E|\tilde Y_n|^2 \le Ka$.

Proof. The proof is quite similar to that of Theorem 1 and
we only remark on the basic set up. The Liapunov functions of
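Theorem 1 will be $V(y) = |y|^2$,

$V_1^a(y,t_n) = 2y'\sum_{i=n}^{\infty}E_n\beta_i^a\,y + 2y'\sum_{i=n}^{\infty}E_n\beta_i^a\bar Y_i + 2y'\sum_{i=n}^{\infty}E_n(\psi_i\mu_i - F_i^a),$

$V^a(y,t_n) = V(y) + aV_1^a(y,t_n).$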


By virtue of (B2) and (B4), the sums are uniformly bounded and,
as required by Theorem 1,

(7.13)  $|V_1^a(y,t_n)| \le K(V(y) + 1).$

Now, applying the mechanisms of the proof of Theorem 1
and using the boundedness of $\{|\bar Y_n|\}$ yields

(7.14)  $E_nV^a(\tilde Y_{n+1},t_{n+1}) - V^a(\tilde Y_n,t_n) \le -a\tilde Y_n'R_n^a\tilde Y_n + Ka^2(1 + |\tilde Y_n|^2).$

Since $R_n^a$ is positive definite, uniformly in $n$ (small $a$) (Corollary 2
and (B3)), there is a $\gamma > 0$ such that $\tilde Y_n'R_n^a\tilde Y_n \ge \gamma V(\tilde Y_n)$ and the
method of Theorem 1 (together with the uniform positive definiteness
of $R_n^a$) yields the desired tightness. Q.E.D.
7.5. Approximating $C_n^m(a)$ by an exponential. Recall the
function $C_n^m(a)$ introduced below (5.3). Here $H_n^a$ is the $H_n$
of (5.3). We can write

(7.15) 


The estimate (7.16) is needed in the proof of the next theorem.
By (B2), (B4) (the limits of the sums are $m_a(s)$, $m_a(t)-1$),

$E\left|\sum_i\beta_i^a\right|^2 \le 2\sum_i\sum_{j \ge i}|E\beta_i^a\beta_j^a| \le K(t-s)/a.$

By this estimate and Chebychev's inequality there is a real $K$
such that

(7.16)  $P\left\{a\left|\sum_{i=m_a(s)}^{m_a(t)-1}\beta_i^a\right| \ge \epsilon\right\} \le Ka(t-s)/\epsilon^2.$


uniformly on bounded $s, t$ intervals if $a \to 0$ fast enough; in
particular, through any sequence $\{a_k\}$ where $\sum a_k < \infty$.


Proof. The proof is very similar to that of Theorem 2,
Part 2. First, fix $t \le T$, let $M$ denote an integer, and divide
$[0,t]$ into $M$ intervals, each of width $\delta$. Suppose (without loss
of generality) that $N = \delta/a$ is an integer and $\delta \le 1$. The
constants $K$ below do not depend on $a$, $\delta$ or on $t \le T$, and
their values may change from usage to usage. We have


(7.17) holds (with the same $K$) when $0$ and $N-1$ are replaced by
$iN-N$ and $iN$, resp., for any $i > 0$. Hence,

(7.18)  $\left|C_0^{m_a(t)}(a) - \left(I + a\sum_{j=NM-N}^{NM-1}H_j^a\right)\cdots\left(I + a\sum_{j=0}^{N-1}H_j^a\right)\right| \le K\delta.$

Let $\{a_k\}$ satisfy $\sum a_k < \infty$. Next, we want to show that

(7.19)  $\left|\left(I + a\sum_{j=NM-N}^{NM-1}H_j^a\right)\cdots\left(I + a\sum_{j=0}^{N-1}H_j^a\right) - \left(I - a\sum_{j=NM-N}^{NM-1}R_j^a\right)\cdots\left(I - a\sum_{j=0}^{N-1}R_j^a\right)\right| \to 0$

uniformly for $t \in \{i\delta;\ i \le T/\delta\}$ w.p.1, as $a \to 0$ through the
sequence $\{a_k\}$, for each fixed $\delta > 0$. Owing to the fact that both
products $\left(I + a\sum_{i=m_a(\tau)}^{m_a(\tau+u)-1}H_i^a\right)$ and $C_{m_a(\tau)}^{m_a(\tau+u)}(a)$ can be made
arbitrarily close to the identity by letting $u$ and $a$ be small,
(7.19) implies that
uniformly in $t \le T$, w.p.1, as $a \to 0$ through $\{a_k\}$. To prove
(7.19), we use the estimate

(7.20)  $|(7.19)| \le Ka\sum_{i=1}^{M}\left|\sum_{j=iN-N}^{iN-1}\beta_j^a\right| \equiv E_0(a)$

and, by using (7.16) with $\delta = (t-s)$ and $M = T/\delta$,

$P\{E_0(a) \ge \epsilon\} \le \sum_{i=1}^{M}P\left\{a\left|\sum_{j=iN-N}^{iN-1}\beta_j^a\right| \ge \epsilon\delta/T\right\} \le K\left(\frac{T^3}{\epsilon^2\delta^2}\right)a.$

Thus $\sum_k P\{E_0(a_k) \ge \epsilon\} < \infty$ and the Borel-Cantelli Lemma and (7.20)
imply (7.19).

Finally, use the fact that by Corollary 2, (7.19) remains
true when $R_j^a$ is replaced by $R(\theta(aj)) = R(\theta_j^a)$ and the fact
that

$\prod_{j=0}^{NM-1}(I - aR(\theta(aj))) \to \Phi(t,0)$

uniformly in $[0,T]$, as $a \to 0$, to complete the proof. Q.E.D.

7.6. The Wiener process and the limit theorem for $\{U^a(\cdot)\}$.
Define $U^a(\cdot)$ as in Section 4 and define $W_n^a$ and $\Gamma_n^a$ by

(7.21)  $W_n^a = \sqrt{a}\sum_{i=0}^{n-1}\beta_i^a\bar Y_i, \quad \Gamma_n^a = \sqrt{a}\sum_{i=0}^{n-1}\psi_i\mu_i,$

and let $W^a(\cdot)$ and $\Gamma^a(\cdot)$ be the continuous parameter processes
with values $W_n^a$ and $\Gamma_n^a$, resp., on $[an,an+a)$. By solving
(7.15) and doing a partial summation, we get (5.4), but where $U_N$
is replaced by $0$ and all $N$'s are deleted. The limit result is
given in Theorem 5 under the additional assumptions:

(B5)  $\{\mu_i\}$ is a sequence of bounded independent and identically
distributed random variables with $E\mu_i = 0$ and $E\mu_i^2 = \sigma^2$.

(B6)  $\{u_n\}$ is a bounded $\phi$-mixing process [10] with
mixing rate $\phi_n$ satisfying $\sum\phi_n^{1/2} < \infty$.


Remark on (B5)-(B6). They are stronger than necessary.
(B5) is used because otherwise $F(\theta) \ne 0$ and it seems pointless to
get a limit theorem for $U^a(\cdot)$, when the $\bar Y(\cdot)$ itself is biased
by $F(\theta(\cdot))$. Also, (B5)-(B6) imply (B2) and also that the sum in (B4) is
zero.


Z  to 
$\{U^a(\cdot)\}$ converges weakly in $D^{k+\ell+1}[0,\infty)$ to the diffusion $U(\cdot)$
given by

(7.23)  $dU = -R_tU\,dt + dW + d\Gamma.$

Remarks. (7.22a-c) are well defined. The sequence $\{u_n, \mu_n\}$
and, for each $\theta$, $\{\psi_n(\theta), \beta_n(\theta)\}$, are stationary processes, so the
subscript $0$ and $\ell$ in (7.22b,c) could be $i$ and $i + \ell$, resp.,
for any $i$. These expressions are calculated by first calculating
the asymptotic moments of $\{\psi_n(\theta)\}$ needed in (7.22) for each $\theta$.
These are continuous functions of $\theta$, so (7.22) makes sense. Note
that the covariance "increment" at $t$ depends only on the
parameter $\theta(t)$, the desired form. Compare (7.22) to the $R$ below
(5.1). They are equivalent if

we  use  f(9,^j)  ~  neither  the  parameters  nor  Yj 

vary  with  time. 

The exact values of the covariances are complicated and one
would not normally want to calculate them, even for some known
"test" variation $\theta(\cdot)$. Theorem 5 gives the structure of the limit
and indicates how the variances depend on the unknown function.
This, in itself, is useful.
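
To give a feel for the kind of limit described by (7.23), the following minimal sketch simulates a scalar analogue by the Euler-Maruyama method. It assumes a constant $R_t \equiv r$ and replaces $W + \Gamma$ by $\sigma B$ for a standard Wiener process $B$; the values of `r` and `sigma` are placeholders and are not the covariances of (7.22). For this scalar equation the stationary variance is $\sigma^2/(2r)$.

```python
import random


def simulate_ou(r, sigma, dt, n_steps, seed=0):
    """Euler-Maruyama path of dU = -r*U dt + sigma dB (scalar, constant coefficients)."""
    rng = random.Random(seed)
    sqrt_dt = dt ** 0.5
    u = 0.0
    path = []
    for _ in range(n_steps):
        # drift step plus a Gaussian increment with variance sigma^2 * dt
        u = u - r * u * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        path.append(u)
    return path


def sample_variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)
```

With `r = 1.0` and `sigma = 1.0`, the sample variance of a long path should be close to 0.5 after an initial transient is discarded. In the setting of the text the drift and the covariance both vary with $\theta(t)$, so a faithful simulation would need the time-dependent quantities of (7.22).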

Proof. Once the assertions concerning convergence to the
Wiener process are shown, the proof is completed as indicated
below (5.6) for Theorem 2. Only the assertions concerning the
Wiener processes will be proved. The proof of those assertions
is based on the proof of similar assertions in Theorem 2 and in
([1], Theorem 2). The main changes are due to the non-stationarity,
which requires altering (A9)-(A11) (resp., (A6)-(A8) of [1]).

In our non-stationary and bounded case, (A10) and
(A11) should be replaced by:† Let $h_i^a = \psi_i\mu_i$ or $\beta_i^a$ and define


(7.24a) 


pf(i)  -  0.  i  > 

J 


0, 


(7.24b)  $\rho_2^a(i) = \sup_k|E_kh_{k+i}^a|, \quad i \ge 0.$

Then

(7.24c)  $\sum_i(\rho_1^a(i))^{1/2} + \sum_i(\rho_2^a(i))^{1/2} < \infty,$

where the sums converge uniformly in $a$.

By the independence of $\{\mu_i\}$, (7.24) is obvious for
$h_i = \psi_i\mu_i$. The sequence $\{\psi_i\}$ is $\phi$-mixing with the corresponding
$\{\phi_j\}$ satisfying $\sum\phi_j^{1/2} < \infty$, because of (B5)-(B6) and the fact
that there are linear $F_n^q(\cdot)$ with uniformly (in $n, q$) bounded
coefficients and $\epsilon_n^q$ satisfying $|\epsilon_n^q| \le K(1-\epsilon)^q$ such that
$\psi_n = F_n^q(u_n,\ldots,u_{n-q},\mu_n,\ldots,\mu_{n-q}) + \epsilon_n^q$. Consequently, both
$\{\psi_i\}$ and $\{\beta_i^a\}$ are $\phi$-mixing with the corresponding $\{\phi_j\}$ satisfying
$\sum\phi_j^{1/2} < \infty$. This implies (7.24) for $h_i^a = \beta_i^a$. The
property (7.24) was used in ([1], Parts 1, 2 of proof of Theorem 2)
to show that $\sum\sqrt{a_i}\,h_i$ was tight and converged weakly to
a continuous martingale, and that $\left\{\left|\sum_{i=m(t_n)}^{m(t_n+t)-1}\sqrt{a_i}\,h_i\right|^2\right\}$ is uniformly
integrable in $N$. The same proof can be used when $a_i = a$. Thus,
$\{W^a(\cdot), \Gamma^a(\cdot)\}$ are tight in $D[0,\infty)$ and all weak limits are
continuous martingales and $\{|W^a(t)|^2, |\Gamma^a(t)|^2$, small $a\}$ is
uniformly integrable for each $t$.

†In [1], $m(t) = \max\{n: \sum_{i=0}^{n-1}a_i \le t\}$ and $a_i \to 0$ as $i \to \infty$ and $\sum a_i = \infty$;
also the superscript 'a' was not used or needed. But the proof can
also be used for our case, since only (7.24c) was used.

Choose and fix a convergent subsequence and index it by $n$,
and let $W(\cdot), \Gamma(\cdot)$ denote the limit. As we will see, the limit
will not depend on the subsequence. Let $q$ be an arbitrary integer, and
$s_i$, $i \le q$, $t$, $s$ arbitrary except that $s_i \le t \le t + s$, and let $g(\cdot)$ be
a bounded continuous function. Let $E_t$ denote $E_{m_a(t)}$. By weak
convergence and uniform integrability,

(7.25)  $Eg(W^a(s_i),\Gamma^a(s_i),\ i \le q)\,E_t[\Gamma^a(t+s) - \Gamma^a(t)][\Gamma^a(t+s) - \Gamma^a(t)]'$

$\qquad \to Eg(W(s_i),\Gamma(s_i),\ i \le q)\,[\Gamma(t+s) - \Gamma(t)][\Gamma(t+s) - \Gamma(t)]'.$

Evaluating the $E_t[\ ]$ term and using the independence of the
$\{\mu_i\}$ yields (limits of the sums are $m_a(t)$, $m_a(t+s)-1$)

(7.26)  $E_t[\Gamma^a(t+s) - \Gamma^a(t)][\Gamma^a(t+s) - \Gamma^a(t)]' = aE_t\sum\mu_i^2\psi_i\psi_i' = a\sum\sigma^2E_t\psi_i\psi_i'.$

Since $|E_t\psi_i\psi_i' - R_i^a| \to 0$ as $|i - m_a(t)| \to \infty$ by (B5), (B6),
the limit of the right side is the limit of $a\sum\sigma^2R_i^a$, which (in turn)
is the limit of $a\sum\sigma^2R(\theta_i^a)$, which (in turn) equals
$\int_t^{t+s}\sigma^2R(\theta(v))\,dv$. Due to the arbitrariness of $s_i, q, g, s, t$, we have
that

$E\{[\Gamma(t+s) - \Gamma(t)][\Gamma(t+s) - \Gamma(t)]' \mid \Gamma(v), W(v),\ v \le t\} = \int_t^{t+s}\sigma^2R(\theta(v))\,dv,$

hence that the right side of (7.22a) is the quadratic covariation
of $\Gamma(\cdot)$. Thus $\Gamma(\cdot)$ is a Wiener process. Similarly, if the
right sides of (7.22b,c) are the quadratic covariation of $W(\cdot)$
and the cross quadratic covariation of $W(\cdot), \Gamma(\cdot)$, then $(W(\cdot),
\Gamma(\cdot))$ is the asserted Wiener process, and the proof will be
completed.

We now do a similar calculation for $W^a(\cdot)$. We need only
show that (limits of sums are $m_a(t)$, $m_a(t+s)-1$ unless
otherwise written)

(7.27)  $aE_t\sum_i\beta_i^a\bar Y_i\sum_j\bar Y_j'\beta_j^a$

converges to the integral in (7.22b) with limits $(t,t+s)$
instead of $(0,t)$. Equation (7.27) equals (use the convention
$\sum_c^b = 0$ if $b < c$)

(7.28)  $\sum_{\ell \ge 0}\sum_{i=m_a(t)}^{m_a(t+s)-\ell-1}aE_t\beta_i^a\bar Y_i\bar Y_{i+\ell}'\beta_{i+\ell}^a + \sum_{\ell < 0}\sum_{i=m_a(t)+|\ell|}^{m_a(t+s)-1}aE_t\beta_i^a\bar Y_i\bar Y_{i+\ell}'\beta_{i+\ell}^a.$

For all $i$, $i + \ell$ in the range of the above sums, the $\phi$-mixing
implies that


► 

i 
Since $\sum_{|\ell| > L}\sum_{i=m_a(t)}^{m_a(t+s)}a\,\phi_{|\ell|}^{1/2} \to 0$ as $L \to \infty$, we may evaluate the limit
of (7.28) by evaluating the limit of the inner sums individually as
$a \to 0$, and then summing over $\ell$. By the same argument which we used
for $\Gamma^a(\cdot)$ below (7.26), the limit of the $\ell$th inner sum is the same
as the limit when $E_t$ is replaced by $E$. Furthermore, by Lemma 1
and its Corollaries, $\beta_i^a$ can be replaced by $\beta_i(\theta_i^a)$ without altering
the limit. Upon making these replacements, we see that the $\ell$th inner sum
converges to the $\ell$th integral in (7.22b) with limits $(t,t+s)$ instead
of $(0,t)$. By the argument used in connection with $\Gamma(\cdot)$, this implies
that $W(\cdot)$ is a Wiener process with the asserted covariance.


We need only show that the cross-quadratic covariation
between $\Gamma(\cdot)$ and $W(\cdot)$ is (7.22c). The proof of this is the same
as that just given for $W(\cdot)$ above. The sum is $\sum_0^{\infty}$ rather than
$\sum_{-\infty}^{\infty}$, since $\mu_n$ is independent of $\mu_i$, $i < n$, and of $\psi_i$ and
$\beta_i^a$, $i \le n$. Q.E.D.